Goto

Collaborating Authors

 sparse network


The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis

Neural Information Processing Systems

Sparse neural networks promise efficiency, yet training them effectively remains a fundamental challenge. Despite advances in pruning methods that create sparse architectures, understanding why some sparse structures are better trainable than others with the same level of sparsity remains poorly understood. Aiming to develop a systematic approach to this fundamental problem, we propose a novel theoretical framework based on the theory of graph limits, particularly graphons, that characterizes sparse neural networks in the infinite-width regime. Our key insight is that connectivity patterns of sparse neural networks induced by pruning methods converge to specific graphons as networks' width tends to infinity, which encodes implicit structural biases of different pruning methods. We postulate the Graphon Limit Hypothesis and provide empirical evidence to support it. Leveraging this graphon representation, we derive a Graphon Neural Tangent Kernel (Graphon NTK) to study the training dynamics of sparse networks in the infinite width limit. Graphon NTK provides a general framework for the theoretical analysis of sparse networks. We empirically show that the spectral analysis of Graphon NTK correlates with observed training dynamics of sparse networks, explaining the varying convergence behaviours of different pruning methods. Our framework provides theoretical insights into the impact of connectivity patterns on the trainability of various sparse network architectures.


The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis

Neural Information Processing Systems

Sparse neural networks promise efficiency, yet training them effectively remains a fundamental challenge. Despite advances in pruning methods that create sparse architectures, understanding why some sparse structures are better trainable than others with the same level of sparsity remains poorly understood. Aiming to develop a systematic approach to this fundamental problem, we propose a novel theoretical framework based on the theory of graph limits, particularly graphons, that characterizes sparse neural networks in the infinite-width regime. Our key insight is that connectivity patterns of sparse neural networks induced by pruning methods converge to specific graphons as networks' width tends to infinity, which encodes implicit structural biases of different pruning methods. We postulate the and provide empirical evidence to support it. Leveraging this graphon representation, we derive a to study the training dynamics of sparse networks in the infinite width limit. Graphon NTK provides a general framework for the theoretical analysis of sparse networks. We empirically show that the spectral analysis of Graphon NTK correlates with observed training dynamics of sparse networks, explaining the varying convergence behaviours of different pruning methods. Our framework provides theoretical insights into the impact of connectivity patterns on the trainability of various sparse network architectures.


Sparse Winning Tickets are Data-Efficient Image Recognizers

Neural Information Processing Systems

Improving the performance of deep networks in data-limited regimes has warranted much attention. In this work, we empirically show that "winning tickets" (small subnetworks) obtained via magnitude pruning based on the lottery ticket hypothesis [1], apart from being sparse are also effective recognizers in data-limited regimes. Based on extensive experiments, we find that in low data regimes (datasets of 50-100 examples per class), sparse winning tickets substantially outperform the original dense networks. This approach, when combined with augmentations or fine-tuning from a self-supervised backbone network, shows further improvements in performance by as much as 16% (absolute) on low sample datasets and longtailed classification. Further, sparse winning tickets are more robust to synthetic noise and distribution shifts compared to their dense counterparts. Our analysis of winning tickets on small datasets indicates that, though sparse, the networks retain density in the initial layers and their representations are more generalizable.








Graph energy as a measure of community detectability in networks

arXiv.org Machine Learning

A key challenge in network science is the detection of communities, which are sets of nodes in a network that are densely connected internally but sparsely connected to the rest of the network. A fundamental result in community detection is the existence of a nontrivial threshold for community detectability on sparse graphs that are generated by the planted partition model (PPM). Below this so-called ``detectability limit'', no community-detection method can perform better than random chance. Spectral methods for community detection fail before this detectability limit because the eigenvalues corresponding to the eigenvectors that are relevant for community detection can be absorbed by the bulk of the spectrum. One can bypass the detectability problem by using special matrices, like the non-backtracking matrix, but this requires one to consider higher-dimensional matrices. In this paper, we show that the difference in graph energy between a PPM and an Erdős--Rényi (ER) network has a distinct transition at the detectability threshold even for the adjacency matrices of the underlying networks. The graph energy is based on the full spectrum of an adjacency matrix, so our result suggests that standard graph matrices still allow one to separate the parameter regions with detectable and undetectable communities.